OCR-Free Document Understanding Transformer
Abstract
Understanding document images (e.g., invoices) is a core but challenging task, since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task using the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models with respect to languages or types of documents; and 3) propagation of OCR errors to the subsequent process. To address these issues, in this paper we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As a first step in OCR-free VDU research, we propose a simple architecture (i.e., a Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show that a simple OCR-free VDU model, Donut, achieves state-of-the-art performance on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible across various languages and domains. The code, trained model, and synthetic data are available at https://github.com/clovaai/donut.
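The abstract names cross-entropy as the pre-training objective but does not spell it out. As a rough illustration only, the sketch below computes a token-level cross-entropy (mean negative log-likelihood) over a target text sequence. It assumes, as is common for Transformer decoders that generate text, that the model emits one vocabulary score vector per output position, conditioned on the image and the ground-truth prefix (teacher forcing). The function names `log_softmax` and `sequence_cross_entropy` are hypothetical and are not taken from the Donut codebase.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary score vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sequence_cross_entropy(step_logits: Sequence[Sequence[float]],
                           target_ids: Sequence[int]) -> float:
    """Mean negative log-likelihood of the target tokens.

    Assumption (hypothetical setup): step_logits[t] holds the decoder's
    scores for position t, computed from the image features and the
    ground-truth prefix target_ids[:t]; target_ids[t] is the token the
    model should emit at that position.
    """
    if len(step_logits) != len(target_ids):
        raise ValueError("need exactly one score vector per target token")
    if not target_ids:
        raise ValueError("target sequence must be non-empty")
    nll = 0.0
    for logits, tok in zip(step_logits, target_ids):
        # Penalize the model by -log p(correct token) at each position.
        nll -= log_softmax(logits)[tok]
    return nll / len(target_ids)


if __name__ == "__main__":
    # Uninformative (uniform) scores over a 4-token vocabulary give log(4).
    print(sequence_cross_entropy([[0.0] * 4] * 3, [0, 1, 2]))
```

With uniform scores the loss equals the log of the vocabulary size, and it approaches zero as the model assigns nearly all probability to each correct token; minimizing it therefore trains the decoder to reproduce the target text.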
Similar resources
Beyond OCR: Multi-faceted understanding of handwritten document characteristics
In the previous chapters, we proposed several features for writer identification, historical manuscript dating and localization separately. In this chapter, we present a summarization of the proposed features for different applications by proposing a joint feature distribution (JFD) principle to design novel discriminative features which could be the joint distribution of features on adjacent p...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2022
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-19815-1_29